7 research outputs found

    Human face recognition under degraded conditions

    This work presents comparative studies of state-of-the-art feature extraction and classification techniques for human face recognition under low resolution, and evaluates the effect of resolution enhancement using interpolation techniques. A gradient-based, illumination-insensitive preprocessing technique is proposed that uses the ratio between the gradient magnitude and the current intensity level of the image, which is insensitive to severe lighting effects. A combination of multi-scale Weber analysis and enhanced DD-DT-CWT is also shown to be noticeably stable under illumination variation. Moreover, applying illumination-insensitive image descriptors to the preprocessed image leads to further robustness against lighting effects. The proposed block-based face analysis reduces the effect of occlusion by assigning different weights to the image subblocks, according to their discriminative power, in score-level or decision-level fusion. In addition, a hierarchical structure of global and block-based techniques is proposed to improve recognition accuracy when different image degradation conditions occur. The complementary performance of global and local techniques leads to considerable improvement in face recognition accuracy. The effectiveness of the proposed algorithms is evaluated on the Extended Yale B, AR, CMU Multi-PIE, LFW, FERET and FRGC databases, with a large number of images under different degradation conditions. The experimental results show improved performance under poor illumination, facial expression variation and occlusion.
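The gradient-to-intensity ratio described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the central-difference gradient, the `eps` stabilizer, and the function name `gradient_ratio` are all assumptions made for illustration. Under a purely multiplicative lighting model (also an assumption), scaling every pixel by a constant scales the gradient and the intensity equally, so the ratio stays almost unchanged.

```python
import math

def gradient_ratio(image, eps=1e-6):
    """Gradient magnitude divided by local intensity for a 2D grayscale image.

    Hypothetical helper: the source abstract only names the ratio, not this formula.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Finite differences, with neighbour indices clamped at the borders.
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            # eps avoids division by zero on black pixels.
            out[r][c] = math.hypot(gx, gy) / (image[r][c] + eps)
    return out

# Toy 3x3 "face" patch and the same patch under three times brighter lighting.
face = [[10, 12, 40], [11, 30, 42], [12, 33, 45]]
brighter = [[3 * v for v in row] for row in face]

a = gradient_ratio(face)
b = gradient_ratio(brighter)
# Largest per-pixel difference between the two ratio maps.
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Here `max_diff` stays negligibly small, which is the sense in which such a ratio is insensitive to a global change in lighting level.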

    Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

    Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene, and employ decision-level fusion to integrate these heterogeneous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present SuMoCo, a novel GPU-friendly supervised contrastive learning framework, to learn better representations. Furthermore, we relabel the test split of the DAD dataset at a finer granularity to enable multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available. Comment: 9 pages (1 for reference); accepted by the 6th Multimodal Learning and Applications Workshop (MULA) at CVPR 202
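To make the idea of attention-based feature fusion with masking concrete, here is a deliberately simplified sketch. It is not the paper's architecture: it uses a single head with identity projections (no learned query/key/value weights) instead of multi-head self-attention, and the masking is applied to whole view tokens at inference instead of to patches during training. The function name `masked_self_attention` and the toy feature vectors are assumptions.

```python
import math

def masked_self_attention(tokens, keep):
    """Single-head scaled dot-product self-attention over per-view feature tokens.

    tokens: list of equal-length feature vectors, one per view or modality.
    keep:   list of bools; False removes that view as a key/value
            (it simulates a collapsed or missing sensor).
    """
    if not any(keep):
        raise ValueError("at least one view must be kept")
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        # Masked keys get a score of -inf, so their softmax weight is exactly 0.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) if kept else -math.inf
            for key, kept in zip(tokens, keep)
        ]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Each output token is a weighted average of the kept value vectors.
        fused.append([
            sum(w / total * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return fused

# Three toy view features; the third view is masked out.
views = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
keep = [True, True, False]
fused = masked_self_attention(views, keep)
```

Because the masked view receives zero attention weight, the fused features for the first two views do not depend on whatever the third sensor reports, which is the behaviour a fusion layer needs in order to tolerate a collapsed view.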

    PWD-3DNet: A Deep Learning-Based Fully-Automated Segmentation of Multiple Structures on Temporal Bone CT Scans

    The temporal bone is a part of the lateral skull surface that contains organs responsible for hearing and balance. Mastering surgery of the temporal bone is challenging because of its complex and microscopic three-dimensional anatomy. Segmentation of intra-temporal anatomy based on computed tomography (CT) images is necessary for applications such as surgical training and rehearsal, amongst others. However, temporal bone segmentation is challenging due to the similar intensities and complicated anatomical relationships among critical structures, small structures that are undetectable on standard clinical CT, and the amount of time required for manual segmentation. This paper describes a single multi-class deep learning-based pipeline as the first fully automated algorithm for segmenting multiple temporal bone structures from CT volumes, including the sigmoid sinus, facial nerve, inner ear, malleus, incus, stapes, internal carotid artery and internal auditory canal. The proposed fully convolutional network, PWD-3DNet, is a patch-wise densely connected (PWD) three-dimensional (3D) network. The accuracy and speed of the proposed algorithm were shown to surpass current manual and semi-automated segmentation techniques. The experimental results yielded high Dice similarity scores and low Hausdorff distances for all temporal bone structures, averaging 86% and 0.755 millimeters (mm), respectively. We showed that overlapping the inference sub-volumes improves the segmentation performance. Moreover, we proposed augmentation layers that use samples with various transformations and image artefacts to increase the robustness of PWD-3DNet against image acquisition protocols, such as smoothing caused by soft tissue scanner settings and larger voxel sizes used for radiation reduction. The proposed algorithm was tested on low-resolution CTs acquired by another center with scanner parameters different from the ones used to create the algorithm, and shows potential for application beyond the particular training data used in the study.
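The two evaluation metrics named in this abstract can be sketched on voxel sets. This is a minimal illustration, assuming each segmentation is given as a set of integer `(z, y, x)` voxel indices and that the Hausdorff distance is the plain symmetric maximum; the abstract does not say which Hausdorff variant (for example a percentile-based one) was used, and the `spacing` parameter and toy masks are hypothetical.

```python
import math

def dice(pred, truth):
    """Dice similarity: 2 * |A intersect B| / (|A| + |B|) for two voxel sets."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def hausdorff(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets, in mm.

    spacing is the physical voxel size along each axis (assumed values).
    """
    def dist(p, q):
        return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))

    def directed(src, dst):
        # Farthest that any source voxel is from its nearest destination voxel.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pred, truth), directed(truth, pred))

# Toy masks: two voxels agree, one differs on each side.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}

dice_score = dice(pred, truth)                               # 2 * 2 / (3 + 3)
hd_mm = hausdorff(pred, truth, spacing=(0.5, 0.5, 0.5))      # one voxel off at 0.5 mm
```

The brute-force nearest-voxel search is quadratic in the number of voxels, so it suits small structures or toy data only; it is meant to show what the numbers measure, not how to compute them at scale.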

    An Explainable Attention Zone Estimation for Level 3 Autonomous Driving

    Accurately assessing the driver’s situational awareness is crucial in Level 3 (L3) autonomous driving, where the driver is in the loop. Estimating the attention zone provides essential information about the driver’s on/off-road visual attention and determines their readiness to take over control from the autonomous agent in complicated situations. This paper proposes a two-phase pipeline to improve the explainability and accuracy of attention zone estimation using an intermediate gaze regression layer, where the true relationships between the input images and output zone labels are interpretable. In the first phase, the proposed GazeMobileNet, a lightweight deep neural network, achieved state-of-the-art performance in estimating the gaze vector on the MPIIGaze dataset, with an MAE of 2.37 degrees. The model was used to extract the corresponding gaze vectors from LISA V2, a driving dataset with in-cabin attention zone labels. As LISA V2 does not contain gaze vector labels, an unsupervised clustering approach was proposed in the second phase to categorize the driver’s gaze vectors and map them to the corresponding attention zones. The proposed method demonstrated improved accuracy and robustness in the zone classification task. The model achieved accuracies of 75.67% and 83.08% for attention zone estimation under the “daytime without eyeglasses” and “nighttime without eyeglasses” capture conditions, respectively. Furthermore, the proposed model surpassed recent research on that dataset, with accuracies of 73.11% and 74.02% under the “daytime with eyeglasses” and “nighttime with eyeglasses” capture conditions, respectively.
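The two phases described in this abstract can be sketched in a few lines: an angular error between predicted and reference gaze vectors, and a nearest-centroid assignment of a gaze vector to an attention zone. This is only an illustration. It assumes the reported MAE is a mean angular error, it assumes cluster centroids have already been found (the assignment step of a k-means-style method), and the zone names and centroid vectors are invented for the example.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3D gaze vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def mean_angular_error(predicted, reference):
    """Average angle between paired predicted and reference gaze vectors."""
    return sum(angle_deg(p, r) for p, r in zip(predicted, reference)) / len(predicted)

def assign_zone(gaze, centroids):
    """Return the zone whose centroid direction is closest in angle to the gaze."""
    return min(centroids, key=lambda zone: angle_deg(gaze, centroids[zone]))

# Hypothetical zone centroids in a camera frame where -z points at the road.
centroids = {
    "road": (0.0, 0.0, -1.0),
    "left_mirror": (-1.0, 0.0, -1.0),
    "instrument_cluster": (0.0, -1.0, -1.0),
}

zone = assign_zone((-0.8, 0.1, -1.0), centroids)
right_angle = angle_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Keeping the gaze vector as an explicit intermediate output is what makes the pipeline inspectable: a wrong zone label can be traced back to either a bad gaze estimate or a poorly placed centroid.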